Communication and cooling aware job allocation in data centers for communication-intensive workloads
Authors
Abstract
Energy consumption is an increasingly important concern in data centers. Today, nearly half of the energy in data centers is consumed by the cooling infrastructure. Existing policies on thermally aware workload allocation do not consider applications that include many tasks (or threads) running on a large set of nodes with significant communication among the tasks. Such jobs, however, constitute most of the cycles in the high performance computing (HPC) domain, and have started to appear in other data centers as well. Job allocation strongly affects the performance of such communication-intensive applications. Communication-aware job allocation methods exist, but they focus solely on performance and do not consider cooling energy. This paper proposes a novel job allocation methodology to jointly minimize communication cost and cooling energy consumption in data centers. We formulate and solve the joint optimization problem using binary quadratic programming. Our joint optimization algorithm reduces cooling energy by 16.4% on average with only a 2.66% average increase in application running time compared to solely performance-aware allocations. To further optimize the communication cost, we develop a Charm++ based framework that extracts the communication behavior of applications. We then integrate our job allocation policy with a recursive coordinate bisection (RCB) based task mapping method to place highly communicating tasks in close proximity. Experimental results show that task mapping further decreases the communication cost by up to 20.9% compared to assuming all-to-all communication, a popular assumption in much of the prior work. © 2016 Elsevier Inc. All rights reserved.
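The abstract names recursive coordinate bisection (RCB) as the basis of the task mapping step but gives no details. As a rough illustration only, the following is a generic textbook-style RCB sketch, not the paper's implementation: it assumes each task and each node has been given a geometric coordinate, splits both sets with the same bisection rule, and pairs the resulting parts in order. The function names `rcb` and `map_tasks_to_nodes` and the coordinate inputs are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def rcb(indices: List[int], coords: Sequence[Point], parts: int) -> List[List[int]]:
    """Split `indices` into up to `parts` groups by recursive coordinate bisection.

    Generic sketch: at each level, cut along the axis with the largest spread,
    sizing the two halves in proportion to the number of parts each must hold.
    """
    if parts <= 1 or len(indices) <= 1:
        return [indices]
    dims = len(coords[indices[0]])

    def extent(d: int) -> float:
        vals = [coords[i][d] for i in indices]
        return max(vals) - min(vals)

    axis = max(range(dims), key=extent)            # widest axis (first one on ties)
    ordered = sorted(indices, key=lambda i: coords[i][axis])
    left_parts = parts // 2
    cut = len(ordered) * left_parts // parts       # proportional split point
    return (rcb(ordered[:cut], coords, left_parts)
            + rcb(ordered[cut:], coords, parts - left_parts))


def map_tasks_to_nodes(task_coords: Sequence[Point],
                       node_coords: Sequence[Point]) -> Dict[int, int]:
    """Pair tasks with nodes by bisecting both sets down to single elements.

    Assumes (hypothetically) one task per node and equal counts, so tasks that
    are close in task space land on nodes that are close in node space.
    """
    n = len(task_coords)
    if n != len(node_coords):
        raise ValueError("this sketch assumes equally many tasks and nodes")
    task_parts = rcb(list(range(n)), task_coords, n)
    node_parts = rcb(list(range(n)), node_coords, n)
    return {t[0]: m[0] for t, m in zip(task_parts, node_parts)}


# Usage: 16 tasks laid out on a 4x4 grid, split into 4 groups.
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
groups = rcb(list(range(len(grid))), grid, 4)
```

In this toy layout each group comes out as a compact 2x2 block of the grid, which is the locality property that makes RCB attractive for keeping heavily communicating tasks close together. How the paper derives task coordinates from the extracted communication behavior is not stated in the abstract and is not modeled here.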
Similar resources
Simulation and Optimization of HPC Job Allocation for Jointly Reducing Communication and Cooling Costs
Performance and energy are critical aspects in high performance computing (HPC) data centers. Highly parallel HPC applications that require multiple nodes usually run for long durations in the range of minutes, hours or days. As the threads of parallel applications communicate with each other intensively, the communication cost of these applications has a significant impact on data center perfo...
Job Scheduling that Minimizes Network Contention due to both Communication and I/O
As communication and I/O traffic increase on the interconnection network of high-performance systems, network contention becomes a critical problem drastically reducing performance. Whereas earlier allocation strategies were either sensitive to communication alone or sensitive to I/O alone, we present a new strategy that is sensitive to both communication and I/O. Our new strategy, MC-Elongated...
Adia: Achieving High Link Utilization with Coflow-Aware Scheduling in Data Center Networks
Link utilization has received extensive attention since data centers become the most pervasive platform for data-parallel applications. A specific job of such applications involves communication among multiple machines. The recently proposed coflow abstraction depicts such communication through a group of parallel flows, and captures application performance through corresponding communication r...
Energy Aware Resource Management of Cloud Data Centers
Cloud Computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Virtualization technology forms a key concept for new cloud computing architectures. The data centers are used to provide cloud services burdening a significant...
Topology-Aware Resource Allocation for Data-Intensive Workloads
This paper proposes an architecture for optimized resource allocation in Infrastructure-as-a-Service (IaaS)-based cloud systems. Current IaaS systems are usually unaware of the hosted application’s requirements and therefore allocate resources independently of its needs, which can significantly impact performance for distributed data-intensive applications. To address this resource allocation p...
Journal title:
- J. Parallel Distrib. Comput.
Volume 96, Issue
Pages -
Publication date 2016